hw1
descriptive statistics
probability
The first homework on descriptive statistics and probability
Author

Yakub Rabiutheen

Published

September 20, 2022

Question 1

a

First, let’s read in the data from the Excel file:

Code
library(readxl)
df <- read_excel("_data/LungCapData.xls")

The distribution of LungCap looks as follows:

Code
hist(df$LungCap,freq = FALSE)

The histogram suggests that the distribution is close to a normal distribution. Most of the observations are close to the mean. Very few observations are close to the margins (0 and 15).
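One way to make the "close to normal" reading more concrete is to draw a fitted normal curve over the density histogram and check how many values fall within one standard deviation of the mean (roughly 68% for a normal distribution). The sketch below is only an illustration: `lung_cap` is simulated stand-in data, not the values from `LungCapData.xls`, and the names `m`, `s`, and `within_1sd` are made up for the example.

```r
# Hypothetical stand-in for df$LungCap: 500 simulated values (not the real data)
set.seed(42)
lung_cap <- rnorm(500, mean = 8, sd = 2.5)

m <- mean(lung_cap)
s <- sd(lung_cap)

# Density-scaled histogram with the fitted normal curve drawn on top
hist(lung_cap, freq = FALSE, main = "Simulated lung capacity", xlab = "LungCap")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)

# Share of values within one standard deviation of the mean
within_1sd <- mean(abs(lung_cap - m) <= s)
within_1sd
```

Swapping `lung_cap` for `df$LungCap` should give the same kind of check on the real data.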

b

The boxplot below compares the distribution of lung capacity by gender.

Code
boxplot(df$LungCap ~ df$Gender)

c

Here is the lung capacity of smokers vs. non-smokers:

Code
boxplot(df$LungCap~df$Smoke,
        ylab = "Capacity", 
        main = "Lung Capacity of Smokers Vs Non-Smokers",
        las = 1)

d

Let’s break it down even further and look at lung capacity by age group.

Code
df$Agegroups <- cut(df$Age,
                    breaks = c(-Inf, 13, 15, 17, 20),
                    labels = c("0-13 years", "14-15 years", "16-17 years", "18+ years"))
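To see how `cut()` assigns ages to these groups, here is a small sketch on a hand-picked vector of ages (the vector `ages` is hypothetical, chosen to sit on and around the break points). By default the intervals are closed on the right, so 13 lands in "0-13 years" and 15 in "14-15 years". With a top break of 20, any age above 20 would come back as `NA`.

```r
# Hypothetical ages chosen to sit on and around the break points
ages <- c(3, 13, 14, 15, 16, 17, 18, 19, 21)

groups <- cut(ages,
              breaks = c(-Inf, 13, 15, 17, 20),
              labels = c("0-13 years", "14-15 years", "16-17 years", "18+ years"))

# Side-by-side view of each age and the group it was assigned to;
# the last row (age 21) is NA because it lies beyond the top break
data.frame(ages, groups)
```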

Below is the overall lung capacity by age group, split by gender and not yet by smoking status.

Code
library(ggplot2)
ggplot(df, aes(x = LungCap, y = Agegroups, fill = Gender)) +
          geom_bar(stat = "identity") +
          coord_flip() +
          theme_classic()

e

Here is a comparison of lung capacity by age group for smokers vs. non-smokers.

Code
ggplot(df, aes(x = LungCap, y = Agegroups, fill = Smoke)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    theme_classic()
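One way to put numbers on this comparison is to tabulate the mean lung capacity for each combination of age group and smoking status. The sketch below uses a small made-up data frame called `toy`. Its column names mirror the ones used above, but the values are invented and the `"no"`/`"yes"` coding of `Smoke` is an assumption.

```r
# Made-up data: two age groups, two observations per smoking status
toy <- data.frame(
  LungCap   = c(6.0, 7.0, 6.5, 7.5, 10.0, 11.0, 9.0, 10.0),
  Agegroups = rep(c("0-13 years", "18+ years"), each = 4),
  Smoke     = rep(rep(c("no", "yes"), each = 2), times = 2)
)

# Mean lung capacity for every age group / smoking status combination
group_means <- aggregate(LungCap ~ Agegroups + Smoke, data = toy, FUN = mean)
group_means
```

Running the same `aggregate()` call with `data = df` should give the group means for the real data.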

f

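Based on the comparison above, the lung capacities of smokers and non-smokers look fairly similar. The covariance and correlation between lung capacity and age are computed next. The correlation of about 0.82 indicates a strong positive linear relationship: lung capacity tends to increase with age.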

Code
cov(df$LungCap, df$Age)
[1] 8.738289
Code
cor(df$LungCap, df$Age)
[1] 0.8196749
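The two numbers are linked: the correlation is the covariance divided by the product of the two standard deviations, which removes the units and bounds the result between -1 and 1. A minimal sketch on toy data (the vectors `age` and `lung_cap` below are invented, not taken from the data set):

```r
# Hypothetical toy data (not taken from LungCapData.xls)
age      <- c(6, 9, 12, 15, 18)
lung_cap <- c(4.5, 6.0, 8.1, 9.7, 10.9)

# Correlation is the covariance rescaled by both standard deviations
r_manual  <- cov(lung_cap, age) / (sd(lung_cap) * sd(age))
r_builtin <- cor(lung_cap, age)

# Compare the hand-computed value with the built-in one
all.equal(r_manual, r_builtin)
```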

Question 2

Code
X <- c(0:4)
Frequency <- c(128, 434, 160, 64, 24)
df <- data.frame(X, Frequency)
df
  X Frequency
1 0       128
2 1       434
3 2       160
4 3        64
5 4        24

As the table above shows, the most common number of prior convictions is 1, with 434 of the 810 observations.

Dividing each frequency by the total of 810 gives the probability of each number of prior convictions. The total, 810, is the sum of the Frequency column, which I verified by hand.

Code
library(dplyr)  # provides mutate(), filter(), and %>%
df2 <- mutate(df, Probability = Frequency / sum(Frequency))
df2

Probability of fewer than 2 convictions (about 0.694):

Code
b2 <- df2 %>% 
  filter(X < 2)
sum(b2$Probability)

Probability of 2 or fewer convictions (about 0.891):

Code
c2 <- df2 %>% 
  filter(X <= 2)
sum(c2$Probability)

Probability of more than 2 convictions (about 0.109):

Code
d2 <- df2 %>% 
  filter(X > 2)
sum(d2$Probability)

What is the expected value of the number of prior convictions? It is about 1.29.

Code
e <- weighted.mean(df2$X, df2$Probability)
e
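The expected value can also be worked out directly from the frequency table in base R, as the sum of each value times its probability. This is a sketch with illustrative names (`x`, `freq`, `p`, `expected`); the frequencies are the ones from the table above.

```r
# Frequency table from Question 2
x    <- 0:4
freq <- c(128, 434, 160, 64, 24)

# Probability of each number of prior convictions
p <- freq / sum(freq)

# Expected value: sum of value times probability
expected <- sum(x * p)
expected

# weighted.mean() with the raw frequencies as weights gives the same number
all.equal(expected, weighted.mean(x, freq))
```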

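Variance and standard deviation of the number of prior convictions. Both are weighted by the probabilities, because calling var() on X alone would treat the values 0 to 4 as five equally likely observations. The variance is about 0.856 and the standard deviation is about 0.925.

Code
p <- df$Frequency / sum(df$Frequency)
v <- sum(df$X^2 * p) - sum(df$X * p)^2  # E[X^2] - (E[X])^2
v
sqrt(v)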
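As a cross-check on the variance of a frequency table, the table can be expanded into one value per observation with `rep()`. This is a hedged sketch with illustrative names (`x_all`, `pop_var`, `pop_sd`). Because `var()` divides by n - 1, the population variance is computed explicitly here.

```r
# Expand the frequency table into 810 individual observations
x_all <- rep(0:4, times = c(128, 434, 160, 64, 24))

# Population variance: mean squared deviation from the mean
pop_var <- mean((x_all - mean(x_all))^2)
pop_sd  <- sqrt(pop_var)

pop_var
pop_sd

# The sample variance from var() differs only by the factor n / (n - 1)
n <- length(x_all)
all.equal(var(x_all), pop_var * n / (n - 1))
```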